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Introduction. 

This paper has two main themes. It tries to model early vision processes 
in terms of minimizing energy functions. Secondly it examines methods of 
minimizing these functions, in particular analog style networks. 

The first section gives some background on the use of energy functions 
and neural networks in vision. The next few sections describe work done 
by the author and collaborators on a number of vision problems. The final 
sections discusses limitations to this approach. 

1. Energy Functions and Networks. 

It is convenient to divide vision up into two stages. In the first stage the 
visual scene is analysed, segmented, and properties such as depth, colour and 
texture are extracted. In the second stage objects are recognized and high 
level information is used. The output of the first stage is a representation of 
the scene in terms of depth values, colour and so on. This representation can 
be called a 2 — 1/2 D sketch (Marr 1982) or an Intrinsic Image (Barrow and 
Tennenbaum 1981). It is generally assumed (Marr 1982, Horn 1986, UUman 
1979) that the construction of such a representation does not involve any 
knowledge of the world (or of the task being performed) more sophisticated 
than low level assumptions, such as the rigidity of objects. This representation 
is produced by a number of independent modules, such as stereo, structure 



from motion, shape from shading. This paper will confine itself entirely to 
the modules of early vision. 

A number of these modules have been modelled in terms of energy func- 
tions. UUman (1979) described a theory of motion involving solving the cor- 
respondence problem between image frames by minimizing a cost function. 
Ikeuchi and Horn (1981) describe a theory of shape from shading using a 
variational principle and Ikeuchi (1980) tises a similar technique for shape 
from texture. Ikeuchi (1980) describes how this technique is able to impose 
smoothness constraints on the object and draws the analogy with imposing 
constraints in Artificial Intelligence. These methods can be illustrated by 
work on optical flow by Horn and Schunk (1981). Let the brightness function 
be I(xi , X2,t). Then, assuming that points with the same image intensity over 
time correspond, the velocity field (1^1,^2) obeys 



f"'-f = » M 



where we use the summation convention over repeated indices (for example 
a,-6i = ai&i -|- 0262)- Now (1.1) is a single equation for the two unknowns 
(^1,^2) and does not specify them uniquely. To obtain a unique solution Horn 
and Schunk assvmie continuity of the velocity field and minimize a function 



Here the first term corresponds to reqmring smoothness of the velocity 
field and the second to enforcing (1.1) . They define an iterative algorithm to 
minimize (1.2) and obtain good results. 

Many other vision problems have been treated in a similar way and we 
mention a few examples. Hildreth (1984) used a similar method to solve the 
aperttire problem for motion based on zero crosssing contours. Crimson (1981) 
uses a similar approach to interpolate a surface through sparse stereo data. 
Terzopoulos (1984) extended Crimson's work using more sophisticated tech- 
niques. Poggio and Torre (1984) descovered the similarity of these methods 
to a branch of mathematics called regularization theory (Tikhonov 1977) and 
proposed a unified framework. These methods all had an important property 
that was both a weakness and a strength; they usually imposed continuous 
solutions and smoothed over discontinuities. Regularization theory (Poggio 
and Torre 1984) indeed required that the solution to a problem depended 
smoothly on the data. Inserting a discontinuity in the solution would reqviire 
a yes/no decision, and hence could not depend continuously on the data. * 

* Terzopoulos (1984) suggested the surface could be interpolated smoothly and 
then the boundaries found by an edge detection operation measuring the "tension" 
in the smoothed surface. 



This inability to deal with discontinuities however had important practical ad- 
vantages, the energy functions tended to be convex and not have local minima. 
Thus they covild be minimized by simple methods such as gradient descent. 
More sophisticated techniques could be used to speed up the convergence. For 
example, Terzopovdos (1984) adapted a mtdti-layer algorithm due to Brandt 
(1977). 

To deal with discontinuities a new approach was needed. Geman and Ge- 
man (1984) did work on image segmentation using line processors. These are 
illustrated in figure 1. There are two lattices, the standard space lattice and 
an additional line processor lattice. The line processor elements are either on 
or off. When a line processor is on it breaks the constraints between the ad- 
jacent space pixels. Similar work was reported by Blake (1983) who used the 
idea of weak constraints * , that is to say constraints which must be satisfied 
almost everywhere but which can be broken at a cost. The binary nature of 
the line processors means that discontinuities can be dealt with. However the 
energy functions are no longer convex and new strategies are needed to mini- 
mize them. Various methods have been tried. Geman and Geman (1984) use 
simulated anneahng while Blake (1983) uses a method which systematically 
approximates the energy function by a convex one, graduated non- convexity.. 



Based on work by Hinton (1979). 



Marroquin, (1985) Marroquin et al (1987) interpret the energy functions in 
terms of probability theory using the Clifford Hamersley theorem, a connec- 
tion described in Geman and Geman (1984). Instead of minimizing the energy 
function they use probabilistic algorithms to estimate the maximum a pos- 
teriori Bayesian estimate of the solution. Apart from simulated annealing, 
which takes a long time, none of these methods are guaranteed to converge 
to the correct result. 

In this paper we will describe an alternative approach to minimizing 
these energy functions based on analog networks of the Hopfield type. Math- 
ematically this gives a method of smoothing the energy function reducing the 
number of local minima. It is also implemented by a network which covild be 
built in V.L.S.I. and which could possibly be implemented by real neurons. 
The V.L.S.I. network would be massively parallel and could minimize the en- 
ergy function orders of magnitude faster than serial, or parallel computers. 
Hopfield networks were originally designed to be an associative parallel mem- 
ory (1982, 1984). In Appendix (1) we give a simple introduction to Hopfield 
networks and then show how their formalism can be extended to allow some 
generalizations. Although the Hopfield networks are nonlinear we can usu- 
ally write analytic closed forms for the solutions they converge to. There is 
empirical evidence that they often converge to the correct result. Moreover 
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we can prove mathematically that they will always converge to a solution of 
the mean field theory equations and thus represent a deterministic method 
to approach the probabilistic solutions (see also Marroquin 1987). There are 
also similarities with the graduated non-convexity approach of Blake (1983). 

Many vision algorithms were designed to be implemented on neuronally 
plausible networks. For example Horn's work on colour (1973) was partially 
intended as a possible model of the human colour system. Cooperative stereo 
algorithms by Arbib and Dev (1975) and Marr and Poggio (1977) were also 
implemented by simple neural-like elements. * Terzopoulos (1984) suggested 
the use of analog networks for surface interpolation, but did not implement 
them. Poggio et al (1985) developed analog networks for quadratic regulariza- 
tion energy functions. They note that, supposing we are considering motion 
smoothing, minimizing the energy function E(vi,dvidxj) given by (1.2) is 
equivalent to solving the associated Eviler Lagrange equations (Cotirant and 
Hilbert 1953) 



* Interestingly the Marr and Poggio network can be considered as of discrete Hop- 
field network. Discrete Hopfield networks, however, are less effective than con- 
tinuous networks for minimizing functions since the continuous networks smooth 
the energy function removing local minima. 



^^ = ±.?l, (1.3) 

d{dvi/dxj) dxj dvi 

Since E{vi^ dvidxj) is quadratic in Vi,dvidxj the equations (1.3) are linear 
in Vi and its derivatives. Any system of linear equations can be modelled by 
an analog network involving resistances, capacitors and inductances (Kaplus 
1958) * and hence (1.3) can be solved for such a network. 

Hopfield style networks could also be adapted to minimize (1.2) and it 
is interesting to consider the differences with the networks described above. 
Hopfield networks have a dynamical update rule, which for this problem cor- 
responds directly to a continuous form of steepest descent, 

at dvi ^ ' 

It follows from the chain rule of differentiation that the energy function 

E{i) will continuously decrease with time 

dt dvi dvi ~ ' \ • ) 

E is bounded below, by 0, and so with this dynamics the system has to 
converge to a minimum of E. Thus E is & Lyaponov function. 

The analog networks of Poggio et al (1985), and the networks of Hopfield, 



''Special tricks are needed to get negative resistors 



can all be constructed out of simple electronic elements (resistors, operational 
amplifiers, etc). They are also compatible with the existing knowledge of the 
electrical behaviour of the dendrites and axons of neurons. Thus neural hard- 
ware would be capable of implementing such networks, although the evidence 
suggests that neurons are considerably more complicated. 



2. Surface Interpolation 

Surface interpolation is a good example for illustrating the difference 
between energy functions requiring smoothness and those allowing disconti- 
nuities. Following the work of Geman and Geman for image restoration, an 
energy function for surface interpolation can be written (Marroquin 1985), * 

E(f, I) = E(/' - /'+0'(1 - U) + Ca Y^ifi - dif + C, J] /.-. (2.1) 

i i i 

Here the c?,- correspond to the depth data, the /,• to the desired answer and 
the li to the line process elements. Cd and Ci are constants. The line process 

* For simplicity of the mathematics we write the energy function in one dimension. 
All the results described also hold for two-dimensional surfaces. 
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lattice is interposed with the space lattice, see figure 1. The /,• can take values 
or 1. If the /,• are all set equal to zero then E reduces to standard surface 
interpolation with a membrane * . This formulation allows the smoothness 
constraint, enforced by the first term of (2.1), to be broken at a cost. If the 
surface gradient, /,■ — fi+i, becomes too large the line process term /,■ can 
switch on at a cost of Ci in the third term. See figure 1. 

This energy function can be generalized to two dimensions in a straight- 
forward manner (Marroquin 1985, Blake 1983, Koch, Marroquin and Yuille 
1986). Additional terms can be added to the energy function to impose con- 
tinuity of lines. The interpolation can be generalized from membranes to thin 
plates by including terms with second order derivatives. This will require in- 
troducing surface gradient processors corresponding to discontinuities in the 
surface gradient. 

There has been much work on surface interpolation and several peo- 
ple have used energy functions of form (2.1). Different strategies were used 
to minimize (2.1), Blake uses a technique called graduated non-convexity in 
which a non-convex energy function is gradually transfomed into a convex 
one. This method is not guaranteed to find the global minima of the original 
energy function and may only work for dense data, nethertheless it gives good 

* Membranes and thin plates correspond to minimizing the first and second deriva- 
tives of a surface respectively. 



(a) 
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(b) 



Figure 1 The line processors. In (a) there are no line processes and the 
interpolation process assumes that the points lie on a single surface. In 
(h) the line processes switch on and break the surface where the gradient 
is large. 

empirical results. Marroquin used simulated aimealing and a nvunber of other 
statistical and deterministic algorithms. Again these were not guaranteed to 
always succeed but give good results. 
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Since there are no guaranteed methods to find the minima of (2.1) we 
would Uke a technique that may not always suceed but which is very fast. In 
a series of papers Hopfield (1982, 1984, 1985) describes networks made up of 
simple analog devices. These networks have two important properties. Firstly 
they could be implemented in V.L.S.I. and perform parallel computations at 
speeds orders of magnitude faster than existing parallel (or serial) computers. 
Secondly the networks may be biologically plausible. 

From our viewpoint these networks have a third advantage; they are able 
to find good estimates of the global minima of non-convex energy functions. 
Hopfield and Tank (1985) demonstrated that they could find close solutions 
to the Travelling salesman problem (to within a few percent of the optimum 
length) for up to thirty cities. The work described below was first reported 
in Koch, Marroquin, Yuille (1986). We now show how to design a network to 
minimize (2.1). 

The key point of the Hopfield approach is to replace the binary variables 
h by continuous variables VJ lying in the region [0, 1]. Each V; is related to an 
internal variable U,, which is unbounded, by a gain function g, Vi = g{Ui). A 
typical choice for g is 



^("') = 1 + e-2Au. • (2-2) 
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This is a sigmoid function monotonically increasing from as Uj i-+ — oo 
to 1 as Uj t— >^ oo. The parameter A controls its "sharpness". As A h-> oo it 
becomes a Heaviside (step-edge) function. The energy function can now be 
written 



.v=Vi 



E{f,V) = ^ifi-fi+,f(l-Vi)+C,J2(fi-dif+Ci^Vi+C,Y. f '9-\V)dV. 

^ Jv=o 

(2.3) 
where the last term is a gain function term. The dynamical equations are 



dUi dE ,^ . 



dfi dE 

Observe that since Ui is related to Vi by the gain ftmction (2.2) the V^* 
will always lie in [0, 1]. This type of dynamics is gradient descent for the /,-. 
If we substitute for Ui in (2.4a) it becomes 

dV- f)E 

§ = -2AV.(l-V0f (2.5) 

and so is a form of gradient descent for the Vi with a weight factor. It can 
easily be checked that £■ is a Lyaponov function for the system 
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dE _ _^(dE_.2 y^.dE.2dVi 
dt ~ ^^dfi^ ^^aV/ dUi' 



=-W4f-umm p-«) 



Since Vi is a monotonic function of Ui the right hand side of (2.6) is 
always negative. Thus E always decreases and is bounded below, so the 
system coverges to a minima. 

Koch, Marroquin and Yuille (1986) describe the implementation details. 
The system was simulated on a SymboHcs 3600 LISP machine. The system 
was tested first in 1-D and then in 2-D. Some experimentation was needed 
to find suitable values for the parameters Cd,Ci,Cg. The system performed 
well even with noisy data and with sparse sampling (sometimes as low as five 
percent of the points were sampled). 

The A parameter controls the degree of smoothing of the gain function. 
For high A the Vi are essentially forced to be either or 1 and the continuous 
system (2.3) is close to the discrete system (2.1). For small A the energy func- 
tion becomes convex (this can be verified by calculating the Hessian of (2.2)). 
Thus A corresponds to the degree of smoothing of the problem. Altering the 
value of A to change the degree of convexity of the energy function is analogous 
to graduated non-convexity (Blake 1983). For our simulations the only differ- 
ence in running the networks with small or large A was the convergence time. 
The smaller A, the longer it took to converge, without any significant effect 
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on the final solution. The convergence time of the networks was usually a few 
time constants (using the adiabatic expansion it can be shown that networks 
obeying equations (2.4) reach a final state after a few time constants). 

The fact that the network converges even with high A suggests a hybrid 
strategy to minimize (2.1). The fi are continuous and are updated by gra- 
dient descent while the Vi are discrete. The Vi are sampled at random and 
changed if this reduces the energy. Some experimental success is reported 
with this (Koch, Marroquin and Yuille 1986). For further work see Hutchin- 
son and Koch (1986), Marroquin (1985), Marroquin et al (1987). Clearly such 
a strategy will only work if the energy function (2.1) has few local minima. 

We now perform a new analysis of the network and prove results showing 
that it converges to a solution of the mean field theory equations. If we use 
a probabilistic algorithm, like the Metropolis algorithm, in the final state of 
the system each line process U will be on with a certain probability Pi{T), 
where T is the temperature. The network will converge to a state where the 
line process elements take deterministic values Pi{T) where T is inversely pro- 
portional to A. It is in this sense that the network is a solution of the mean 
field theory equations. Note that, because of the coupling with the depth 
field /,-, there will be several solutions to the mean field theory equations and 
we cannot guaurantee that we will find the one with least energy. Another 
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deterministic method, based on probabilistic considerations, of finding solu- 
tions of the mean field theory equations for this problem has been proposed 
by Marroquin (1987). 

Using (2.6) and substituting for V we see that the energy function E 
decreases at the rate 

dE _ yr^(dE.2 y-^.dE.2dUi 

The energy is bounded below so the system converges to a state with 
dEjdt = 0. From (2.7) we see that this implies (note that dUi/dVi is always 
positive and non-zero) 



dE 

aT = 0' (2-8a) 



dfi 



dE 

g^ = 0. (2.86) 



Note that 



dVi 2A 



dUi (e^i/i +e->'t;.)2 



(2.9) 



We calculate 



Wi = (eAt/..+eA^..)2 (-(/'-H - /O' + Ci + 2\C,Ui). (2.10) 
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This function has zeros at CgUi = (/i+i — fi)^ — Ci and at ±00. Cal- 
culating the second derivatives of E with respect to Ui we see that the zeros 
at ± inf are maxima and the zero at CgUi = (/i+i — fif — C/ is a minimum. 
Thus although E is not convex with respect to Ui it has a unique minimum. 
Note that none of the Vi will be exactly or 1. The true energy minima will 
therefore have 



CgUi = {fi^r-fif-Ci. (2.11) 

Recall that, for large A, the sign of Ui determines whether Vi = or 1. 
Thus a discontinuity will be imposed only if 



(/i+i -/.)'> C/. (2.12) 



The depth terms fi will obey 



(/,• - fi^,){l - V;-) + {fi - fi.,)(l - Vi-i) + ifi - di) = 0. (2.13) 

We rewrite (2.11) in terms of /,• as 



^' 1 + exp-2A((/.-+i - fif - Ci)lCg ■ ^^'^^^ 
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We have shown that if the interpolated surface gradient exceeds a thresh- 
old then a discontinuity will be inserted. It is not clear however that the 
discontinuity will be inserted at exactly the correct place. Observe, however, 
that in the absence of depth data there is no clear criterion for where the edge 
should be. 

A stochastic algorithm (for example Marroquin 1985) would use gradient 
descent for the /j, as in (2.4a), and a probabilistic update rule for the line pro- 
cessors. The line processors axe examined seperately and updated as follows: 
(i) Calculate the change in energy AE{li) resulting from changing the state 
of the hne process /,-, (ii) If AE{li) < make the change, (iii) If AE{li) > 
make the change with probability 1/(1 -|- exp—2AE/T). 

This system will converge to a state of thermal equilibrium with 



p(li = 0) = Aexpi-(fi+, - fifiT), (2.15a) 



p{li = 1) = Aexp{-CilT), (2.156) 

where A is a normalization factor to ensure p(/i = 0) -F p{li = 1) = 1. Thus 



P^^' = ^^ = TT TTf 7T2 T7F- (2.16) 

1 + exp-{{fi^i - fif - ci)/T 
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Comparing (2.16) with (2.14) we see that the network indeed finds a 
solution satisfying the mean field theory equations. 



3. Motion Smoothing and Segmentation 

We now briefly describe some work on motion smoothing and segmenta- 
tion. This work was done in collaboration with C. Koch. Mathematically it 
is very similar to the work on surface interpolation. 

The problem we are tackling involves segmenting an image using motion 
flow (with correspondence based on image intensity). For example, detecting 
an object moving over a textured background. 

A method for obtaining the motion field was described in section 1. This 
assumes continuity of the motion field and therefore would break down at the 
boundaries of the object. It is straightforward, however, to modify the energy 
function to include line processes. The line processes should switch on at the 
boundaries of the object thereby both segmenting the scene and preventing 
the velocity field being distorted by smoothing over the boundaries. Inside 
these boundaries the energy function should give the velocity field as before. 
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The energy function can be written as 






+ J2((^Ui,J - <i)' + «ij - <>)')(! - hij) + («,+x - vfjf 



hj 



+Ki+i -<i)')(i-^^i) + En^'^)- (3-1) 

The horizontal and vertical line processors are given by hij and Vij 
respectively. The V(h, v) term corresponds to the cost for the vertical and 
horizontal line processes including the terms enforcing local continuity. 

This energy function is very similar to that for surface interpolation. 
Once again the line processes produce local minima in the energy function 
and no algorithms are guaranteed to converge. 

It is straightforward to design a Hopfield network to minimize (3.1). The 
hi J and Vij become continuous variables related to new variables pij and qij 
by hjj = g(pi,j) and Vij — g{qi,j) respectively. Gain function terms for h 
and V are added to the energy. The dynamics are defined as in the previous 
section 



-rr = -^=;— (3-2a) 
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dyy 
dt 



dE 
dvv 



(3.26) 



dt 



dE 
dhij 



(3.2c) 



dt 



dE 

dViJ 



{3.2d) 




Figure 2 An example of an object moving relative to a fixed background 
This network is still being tested but preliminary results are encouraging. It 

is able to segment a textured object moving over a textured backgrovmd field. 

It can be mathematically analysed, as for surface interpolation, and similar 

results hold. 
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4 Motion correspondence 

We now consider a rather diiFerent type of vision problem namely motion 
correspondence and the recovery of structure. This work is described in detail 
in Grzywacz and Yuille (1986). 

UUman (1979) proposed dividing the structure from motion process up 
into two different stages. The first consists of matching tokens, such as points 
or straight lines, between different image frames; solving the so called cor- 
respondence problem. Once this matching is done the second stage assumes 
rigidity to recover the structure of the object. Ullmanl984 later suggested an 
alternative method of finding the structure (see also Grzywacz and Hildreth 
1987) capable of dealing with non-rigid motion. He again tised psychophysics 
to argue that the three dimensional structure of an object was not perceived 
immeadiately but developed gradually over time. He proposed that the vi- 
sual system constructed an internal model of the object, initially fiat, which 
was updated over time by assuming the minimal change of rigidity between 
sucessive image frames, the so called incremental rigidity scheme. 

It is natural to ask whether errors are caused by dividing the process 
into two stages. Both are solved using different assumptions and it is possible 
that these conflict for some stimuli. It is also interesting to see if rigidity 
alone is sviflBicient to solve the correspondence problem. To investigate this 
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we define a cost fvinction that minimizes incremental rigidity and solves the 
correspondence problem simultaneously. 

The incremental rigidity scheme maintains a model of the object at any 
given time M(t) = {xi(t), j/,(f), Zi{t)) ,i = 1, ..., N. At the next time frame, at 
t + St, the model is updated taking into account the new information available 
(we are assuming orthographic projection onto the x,y plane). The model is 
updated to M{t + 6t) = {(xi{t + 6t), yi{t + St), Zi(t + St))) , i = 1, ..., N where 
Zi(t-i-St) is determined by minimizing the change in rigidity between M(t) and 
M(t + St). The object is initially assumed flat, i.e. M(0) = ((xi(0),yi(0),0)) 
i = l,...,N. 

We first investigate using rigidity to solve the correspondence problem 
and determine the structure simultaneously. A measure of rigidity is 



Lij(t) = (Xi{t) - xj(t))^ + (yi(t) - yj(t))^ + (ziit) - 2jit))\ (4.1) 

The Zi(t + St) are defined to minimize the change in rigidity AR between 
frames 



N 



AR = Y^i(Lij(t) - Lijit + St)))\ (4.2) 

hi 

We now define a set of binary correspondence variables {Via). If point i 
in the first frame goes to point a in the second frame then Via = 1, otherwise 
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Via = 0. We can define a matching cost Er by 

N 
Er= Y, {(Lij(t) - Labit + St))fViaVjb. (4.3) 

i,j,a,b 

To find the correspondence and structure we minimize Er with respect to 
{za{t + St)) and {Via) requiring that all points in the first frame are matched 
to exactly one point in the second. 

We use a method developed by Hopfield and Tank (1985) for the Travel- 
ling Salesman Problem. Again we first define a new array of variables, (l7,o). 
These, are internal variables of the new problem and have a monotonically 
increasing relationship to Via'. 

where A is again a parameter of the problem. We next define the full energy 
function to be: 



. N N N rj N N N 

^ = I E E E ^'-^i- + f E E E "-.'''> 

o=l i=l i=i t'=l o=l !>=1 



4((tty--^)y+T^n (4-5) 



,= 1 o=l 



r^ N N 
+ ?tEE((^-M1^-) + ((1 - yia))log{(l - Via)))), 



2A . 

,= 1 a=l 



where A, B, C, D, F are positive parameters of the problem. (We will infor- 
mally identify each of the terms of the right hand side of Eq. 4.5 by the 



24 

parameter leading it.). Minimization of the A term forces each feature in the 
second frame to maintain correspondence with as few features as possible in 
the first frame, (and vice versa for the B term). Minimization of the C term, 
forces the total amount of correspondences to be iV. Thus the terms A and 
C will force A^ correspondences of strength 1 to be established in such a way 
that each feature of the first frame will tend to have a correspondence with 
a feature in the second frame and so that the correspondences will be evenly 
distributed among the features of the second frame. It follows that the process 
will tend not to leave any feature vmmatched. 

The F term is necessary to give a time constant for convergence of the 
network. We define the usual update laws 

Provided that A is large enough the variables Via will tend to be either 
or 1 and thvis it will tend to force a binary decision to determine whether a 
correspondence is to be established or not. 

We simulated this network on a Symbolics 3600 LISP machine. The 
results are described in detail in Grzywacz and Yuille (1986). To summa- 
rize them: despite extensive experimentation with the parameters the sys- 
tem rarely converged to the correct answer iinless given a hint of the correct 
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matches. The system made some interesting mistalces, it would sometimes 
choose matches which were almost rigid but which corresponded to compli- 
cated motion of the object between different frames. This suggested that 
rigidity alone was not a strong enough constraint and we should introduce 
another term in the energy function corresponding to smoothness of motion 
between frames. After some experimentation we fell back on the energy func- 
tion Emm used by UUman (1979) to solve the correspondence problem. 

j^MM=yf:f:v,M„. (4.7) 

,=1 a=l 

When we added the Emm term to the energy E the network gave con- 
sistently good results for a wide range of data Grzywacz and Yuille (1986). It 
gave a high percent of correct matches for systems of up to thirty points. 

To see what contribution the rigidity term made to the matching we 
removed it and ran the system using the minimal mapping term. The system 
gave identical results suggesting that the rigidity term was usually uneccessary 
for matching. A possible exception is at the occluding boundary of an object. 
Here the order of points can reverse between frames and the minimal mapping 
scheme gave incorrect results. For small angles and for some values of the 
parameters the rigidity term obtained the correct matching. 

In fact it is easy to see that, with the correct matching, the minimal 
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mapping energy will be zero for rigid motion in a straight line. This supports 
the idea that rigidity may be important for correspondence for rotational 
motions. We plan psychophysical experiments to investigate this case. 

In our simulations using the Emm term we did not try to optimize the 
parameters A,B,C,F,M and A in any sense. Instead we found that the 
asymptotic behavior of the system was the same for a large range of parameter 
values (few orders of magnitude). Typical values vised during the course of 
this research were A = B = 50000, C = 500000, F = 1,M = 50 and A = 1, 
where the distances between features in a given frame ranged from 1 to 10. 
We used homogeneous initial conditions for our simulations, i.e.: 

Via{t = 0) = ^. (4.8) 

We also tested our network on simple situations such as dot splitting, 
when there are two dots in the second frame equidistant from a dot in the 
first. Interestingly our network gave the correct psychophysical result; the 
dots split into two with Ki = 1/2. This suggests psychophysical experiments 
comparing the predictions of the networks to that of observers for other simple 
stimuli. 

There is an important practical advantage for V.L.S.I. circuits in only 
using the minimal mapping term for matching. It can be shown (Grzywacz 
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and Yuille 1986) that the MM term can be choosen to be Unear in the matching 
term,i.e. 

^MM = YEEl^ia^„. (4.9) 

,=1 a=l 

This gives an energy function 

. N N N r, N N N 

a=l i=l j=l j=l o=l '=1 

+f((Ei:^«-^))'+^MM (4-10) 

x=l o=l 

P N N 

This means that the c?ja's do not affect the connection strengths between 
elements. Thus the connection strengths will not have to be changed with 
each time frame. 

The energy fimction described above is rather different from those in the 
previous two sections. It looks considerably more complicated and a lot more 
care is needed to find the correct parameters. Unlike the previous problems the 
value of A is important, it has to be small compared to the other parameters 
of the problem. Finally the similarity of the energy function to that used by 
Hopfield and Tank for the Travelling Salesman Problem suggests it contains 
many local minima. Given all this, the success of Hopfield style networks 
for this problem is encouraging. Hopfield networks give progressively worse 
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results for T.S.P.'s with more than thirty cities. We can prove, however, 
(Grzywacz and Yuille 1986) that our network will always yield the correct 
answer if the motion between frames is svifHciently small compared to the 
seperation of the dots. Thus if time between frames is small enough the 
correct matching will be made. Other ways of dealing with large numbers of 
points will be discvissed in section 6. 

As for surface interpolation we can derive an analytic expression for the 
solution. Using the chain rule for differentiation we find 

dt ^ dVia dUia dUia ' ^ ' ^ 

ta 

From (4.4) we calculate 

dUia cosh^XUia 



(4.12) 



dVia 2X ' 

This term is always positive definite as it is boimded below by 1/2A. The 

energy is bounded below and so the system reaches a final state with dE/dt = 

0. Using (4.11) and (4.12) we see that a necessary and sufficient condition for 

such a state is 



dE 

= 0. (4.13) 



Inverting (4.4) 
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Differentiating (4.10) 

M: = ^^y(yr--+vr"'-2v,.HC(v-„H4.+pu..). (4.i5) 

Here we have introduced new notation. V = ^^^ Via, Vf^^ = J2a ^'<» 
and V^^^ = Si Via- We have also absorbed the constant M/2 into the 
definition of dia . 

Observe that (4.15) vanishes at Uia = ±oo. But we can show that these 
roots do not correspond to minima and hence are not solutions. To prove this 
we must show that the Hessian of E with respect to the Uia is not positive 
definite there. We calculate 



— = 2A , 4A 



-2XtanhXUia{AiVf''^ + V/^^^ - 2K„) + C(V -n) + d^^ + FUia)). (4.16) 

For large Uia the dominant term inside the bracket is —2XFUiatanhXUia. 
Thus d'^E/dUfg^ will tend to zero from below as Uia *-^ ±oo. Hence the Hessian 
cannot be positive definite there and the roots at infinity are not minima. 

Now consider the other roots of (4.15). These obey 
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A(V;-CO^ + y««^ - 2Via) + C{V -n) + 4^+ FUia = (4.17) 

and are also roots of dE/dVia = 0. At such points 

dUiadUjt dViadVjb dUia dUjb ^ ' ^ 

and so the Hessian of E with respect to Uia will be positive definite if and 

only if the Hessian of E with respect to Via is positive definite. We calculate 



d^E ^ F , , 



dViadVia V,kil-Vck) 



^ ^ =A + 2C (4.196) 



dViadVii, 
d^E 

dViadVja 



= A + C (4.19c) 



d^E 

which is positive definite. 

Thus the system will converge to a minima of the energy fimction which 
are given by 



jj.^ ^ 2C(n -V)-<f,,- AjVfo^ + Vr"^ - 2K„) ^^ ^o) 
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Unlike the surface interpolation case we have found no simple interpre- 
tation of these equations in general, although results can be obtained for 
particular cases (Grzywacz and Yuille 1986). Again notice that the final Via 
must lie between the limits and 1. 



5 Stereo. 

A natural way of writing stereo in the form of an energy function is as 
follows (Barnard 1986, Horn 1986, Gennert 1987) 



Eid) = J2{Li - i2.+d(.))' + f^Yl('^(i + 1) - d(i))\ (5.1) 

> t 

Here d(i) is the disparity between the images. Li and Ri are meastires of 

the left and right images and can be continuous or discrete. For example they 

could be the image intensities, or they could be the positions of zero crossings. 

The first term in (5.1) matches the left and right images in such a way as 

to minimize the disparity gradient represented by the second term. Line 

processors can be introduced to prevent the disparity gradients from becoming 
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too large and allows distinct objects to influence each others matching. *. This 
yields 



t t t 

(5.2) 
The /j performs two functions in (5.2). It prevents the disparity gradient 
from becoming too large but it also prevents matching in the first term if the 
difference between the images is too great. This latter effect may help deal 
with occluding situations when one eye sees a region which the other cannot. 
It is simple to generalize this energy function to two dimensions. 

It is straightforward to design a Hopfield net for this problem in the usual 
manner. We simulated this on two types of examples (in two dimensions). The 
first consisted on a sine wave with the central square displaced and the second 
was a standard random dot stereogram. The results were disappointing in 
both cases. Although it was possible to get roughly the right answer for the 
sine wave the random dot pattern gave consistently poor results. This occured 
despite filtering the images with a variety of filters, gaussians and difference 
of gaussians at various scales. 

We believe that this bad behaviour is due to the complicated structure 



* This work was done in collaboration with T. Poggio 
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of the energy as a function of d. The behaviour of this function is crucially 
dependent on the structure of R. For example if i? is a linear function of its 
arguments the energy will be quadratic in d and therefore well behaved. If R 
is more complex, in particular if it arises from a random dot stereogram, the 
energy function can be a complicated function of d with many local minima. 
To apply this method to a general scene would require smoothing the images 
(by an amount determined by pre-processing) until the function R was sufl5- 
ciently well behaved, avoiding smoothing the image too much to destroy its 
interesting featvires. While this is conceivable it seems that alternative stereo 
algorithms are likely to be more sucessful. 

A possible approach is to attempt to minimize (5.2) using other algo- 
rithms. Some success (Barnard 1986) has been reported for using simulated 
annealing for (5.1) (in two dimensions). So far attempts to use simulated 
annealing, and other stochastic methods, to minimze (5.2) have not been 
sucessful. Although it has been possible to hand tune the parameters to get 
the correct result for one class of stimuli, sign waves with displaced centres, 
it will not work on others, such as random dots. Various refinements of the 
methods have been tried including a coarse to fine strategy of convolving the 
image with a large scale gaussian, minimizing the energy at this scale and 
using this to guide the matching at smaller scales. 
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This result is negative and by no means conclusive. More sophisticated 
algorithms could be tried including perhaps methods of estimating the pa- 
rameters of the energy function directly from the image. Alternative energy 
functions could be tried. We believe, however, that this may represent a Umit 
for the practical use of energy functions. If an energy function is too compli- 
cated for straightforward algorithms to solve then the problem is badly posed 
and more heuristic methods should be used. This will be discussed further in 
the next section. 



6 Limitations of the energy function approach. 

The previous four sections described attempts at modelling problems in 
terms of energy functions using analog nets, The first three attempts were 
reasonably sucessfvil and the last one failed. We argued that this reflected the 
relative complexities of the energy functions being minimized. 

It is clearly possible to write any vision problem in terms of minimizing an 
energy function * . It is less clear that this is a good strategy. For non-trivial 



In the same way that all the laws of physics can be summarized in one equation 
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problems it often leads to energy functions with many local minima depending 
on large nvraibers of parameters. These parameters will often depend on the 
image being viewed and might even have different values in different regions 
of the image * . As yet there is no reliable way to estimate these parameters, 
or of minimizing energy functions with many local minima. 

Energy fimction methods seem a natural idea for matching representa- 
tions of images containing a large number of similar primitives. Random Dot 
stereograms (introduced by Julesz 1971) are ideally suited for this type of 
algorithm. Realistic images, however, contain many different features of vary- 
ing sizes. A good strategy for stereo could involve matching the most salient 
features and using this to guide the ambiguous features. The work on stereo 
by Mitchison and McKee (1987) shows that for one-dimensional sterograms 
the positions of the endpoints have an important effect on the matching of 
the interior points. Another interesting example of this type is psychophysics 
for motion correspondence illustated by Ramachandran's analogy of a mov- 
ing leopard (Ramachandran 1985, Ramachandran and Anstis 1983). If the 
outline of the leopard is not visible the leopard's spots are matched to nearby 
neighbours and no motion is seen. If the outline is also visible then its mo- 
by summing the squares of all the individual laws (Feynman 1963). 
* For example in the way the noise thresholds are determined locally for the Canny 
edge detector. 



36 

tion "captures" the spots and they are matched correctly. This suggests that 
random dot stereograms are a limited paradigm and that although humans 
have the capacity to match them correctly they may not be the strategy used 
for real images when more information is available. 

Moreover the energy function approach, at least in its most naive form, 
tends to ignore some of the structure of the problem. For example, the motion 
smoothing and segmentation algorithm described in section 3 would, in theory, 
be able to detect the boundaries of the leopard but it would not be able to 
use the speed of the boundary directly to influence the internal matching. 
There would certainly be an indirect influence, since the motion is required 
to be smooth, but this would be weak and depend on the distance from the 
boundary. One can contrast this with a more heuristic approach which would 
analyse the scene, detect the object boundaries, estimate their velocities and 
use this as initial data for matching the interior of the object. The algorithm 
could be designed in terms of several different networks connected together 
and certainly individual parts of this method could be implemented by energy 
functions. For example Grzywacz (private communication) has shown that the 
network described in section 4 will correctly match the leopard's spots if the 
estimated motion is available as initial data. 

The stereo energy function (5.2) also does not capture some of the im- 
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portant features of the problem. Places where the line-processors should be 
on correspond to boundaries of objects, and therefore should correspond to 
edges in the image. Thus it would be more sensible to find these edges by 
simple processing of the image rather than by minimizing (5.2). 

For any given module, such as stereo, there are many possible algorithms 
some matching image intensity, others matching edge-like featvires. The rela- 
tive effectiveness of these different algorithms will depend on the images being 
viewed, an edge-based stereo algorithm would be ineffective in a scene con- 
taining few or weak edges. For some scenes it would be natural to try to find 
and then match salient features, such as the occluding boundaries of objects 
or regions of high texture density. Heuristics like coarse to fine matching ,as 
used by Marr and Poggio (1979) for stereo, could also be used. For realistic 
images a stereo system might have to use a number of different algorithms, or 
submodules, interacting with one another and combining to give the solution. 
These submodules might each be implemented in terms of networks minimiz- 
ing energy functions but the most important part of the calculation, and the 
hardest part of designing such as system, would lie in the control strategy for 
combining the different submodules. 

Thus although minimizing energy fimctions is a useful technique for early 
vision it has definite limitations. If the energy has too many local minima it 
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may be better to try heuristic methods to avoid them rather than to vise 
compUcated search techniques. Minimizing an energy function is only one of 
the many different search strategies used in Artificial Intelligence research and 
is only effective for certain problems. 



7 Conclusion. 

We described how a number of problems in early vision could be de- 
scribed in terms of minimizing energy functions. We showed that Hopfield 
style networks were able to give fast reliable answers for some of these. We 
discussed the limitations of this approach. 
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Appendix 

A.l. Hopfleld's Formalism 

Hopfield describes a system of n neurons with output's V, and an external 
input li, where i runs from 1 to n. He defines an update rule by 



^^^ = E^o-^i-^+^^ u-i-i) 



Here Tij is a measure of the strength of the connection between neuron 
i and neuron j. A priori, every Tij can be positive, negative or zero. C, is 
a capacitance and Ri = 'i-/J2j'^ij ^ resistance associated with every neuron. 
Ui represents some internal function of the neuron i, for example its somatic 
potential, and is given by 



Ui = 9-\Vi) (A.1.2) 

where g{x) is a monotonic increasing, but bounded function. This model 
is deterministic, its final state will depend on the initial conditions and the 
inputs Jj. Hopfield argues that it embodies a content addressable memory. 
To do this he defines an "energy" function 
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with the restrictions Tij = Tji and Tu = 0. Differentiating (A. 1.3) using 
the chain rule and substituting (A. 1.1) and (A. 1.2) we have 



f = -Ec^^r'W)(f)^ (A.1.4) 



which is always negative. E is hence a Lyaponov function and by standard 
results of Statistical Mechanics any solution to (A. 1.1) will converge to one of 
a fixed number of stable points, provided E is bounded below. The precise 
fixed point (A. 1.1) converges to will depend on the initial conditions and the 
external inputs. The potential of the system contains a large number of local 
minima and the minima the system ends up in is determined by the initial 
conditions. The system can therefore be thought of as a content addressable 
memory. The requirements that the connectivity matrix T is symmetric is 
needed in order to assure that the update of Ui has the form of equation 
(A.1.1). 

A. 2 Extending Hopfield's networks: We will now proceed to propose 
a more general class of networks and update rules, The essence of Hopfield's 
networks can be described as follows. First we define an "energy" function 
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E = E(Vi). (A.2.1) 

We should emphasize the quotation marks round "energy". It is not 
necessarily the energy of the physical system but merely a Lyapunov function. 
It is bounded below. We will reformulate our update rule as 

^J1--TA-^ (A22) 

3 ■' 

This is a generalization of (A. 1.1). In order for £J to be a Lyapimov 
function, its temporal derivative must be everywhere negative or zero. In 
other words, the "energy" must always decrease or at most remain constant, 
but never increase. Differentiating (A. 2.1) using (A. 2.2) yields 

f!^-_y-^-i^f[Ii-_V^-^^ U23^ 

dt ~ ^^'> dt dt ~ 2^^'^dVidvi ^^"^-^^ 

£^ is a Lyapunov function (and hence the system has a content addressible 
memory) if and only if ^jj is positive definite, i.e. x'^Ax > for all vectors x. 
For any arbitrary function E there are therefore an infinite number of possible 
updating rules and so an infinite number of possible systems. 

It is straightforward to check that Hopfields network defined by (A. 1.1), 
(A. 1.2) and (A. 1.3) can be obtained by setting 
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A-- = Cigr"{Vi)6ij. (A.2.4) 

We will show in A.3 that Hopfield's energy function will still be a Lya- 
punov function if we set 



A-/ = Cig-''(Vi)liSij, (A.2.5) 

where the Ij are an arbitrary set of positive numbers. This enables us to 
relax the symmetry constraint on the T,-,- to 



Tijh = Tjilj, {A.2.6) 

where there is no summation over the indices. We prove in the next 
section that for an n x n matrix this gives us an additional n — 1 degrees of 
freedom. 

It is important to note that the number of constraints is less important 
than the form of the constraints. In general, the matrix Tij will be sparse, 
since most connections do not exist. This becomes especially true if n is 
large. The /,'s can then be applied to the remaining non-zero Tjj. Hopfield's 
symmetry conditions implies, that once the top left half of T,j is specified, 
the bottom half will be completely determined, since Tij = Tji, independent 
of the number of zeros in T,y. In our extension, however, n — 1 of the non- 
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zero entries can be specified at will, as long as Tij and Tji have the same 
sign. This condition prohibits the use local inhibitory interneurons, that is 
neurons which are excited {Tij > 0) but which inhibit in turn (Tji < 0). If the 
network is used as a content addressible network without allowing too many 
errors during "recall", only « 6% of the possible ri^ - n entries are different 
from zero (Hopfield, 1982). Thtos, for n = 30 and using Hopfield's symmetric 
network, only about 26 T,j's can be specified while the remaining ones are 
fixed. Our extension implies, however, that all Tji can take on arbitrary 
values — as long as the sign of transposed elements is the same. Notice that 
these conditions in no way constrain the diagonal terms Tj,-. * 

A.3 The full extension: 

Formally, the integrability condition (A.2.1) in E is analogous to the ex- 
istence of potentials, or state functions, in Thermodynamics. If this condition 
is not met, then the value of E depends on the path taken in Vj space and 
hence cannot be a Lyapunov function. 

There are many other ways of constructing Lyapunov functions for a 
system with a given update rule. 

In order to study the class of connectivity matrices Tij leading to con- 
verging behaviour, we will now express Hopfield's update rule as 



*An alternative to our proof is scaling «, and Vj- independently (Hopfield, private 
communication). 
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By inverting equation (A. 2. 2) we can define the appropriate Lyapunov 



function 



i = -E^r/f (^.3.2) 



or, 



E(Vi) = -fj2^^^dVj. (A.3.3) 



Substituting from (A.3.1) gives an integrand / 



If this expression is integrable, that is its value is independent of the 
path along which the integral was evaluated, then E is well-defined and a 
Lyapunov function. A function I is integrable if, and only if, dl = where 
d is the exterior derivative operator (Misner, Thorne, Wheeler, 1977). Define 
Bij and hi by AJ'^ = Cig~^'(Vi)Bij and hi = g~^{Vi)/Ri. The integrand is 
then 



^ = E BijTikVkdVj - J2 BijhidVj - Y, BijIidVj. (A.3.5) 
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If Bij and Tij are independent of the Vj's then / is integrable provided 



dI=J2 BkjTkidVi X dVj - Y^ Bij ^dVi x dVj = 0. (>1.3.6) 

Since dVi xdVj is antisymmetric in i and j this will hold provided Y^^ BkjTki + 
Bijj^ is symmetric in i and j. If /ij is non-Hnear this will only be possible 
provided Bij does not contain off-diagonal terms. Thvis, with 



Bij = liSij, (A.3.7) 

where the /j's are positive numbers, equation (A. 3. 4) changes into 



J2^iSii(£TikVk - ^^^ - Ii)dVj. {A.Z.S) 



97\Vi) 

t,] k 

If UTij is symmetric, then (A. 3.8) can be integrated since it consists only 
of terms Hke d(ViVj) and g~^{Vi)dVi which are well-defined. Thus, we can 
generalize Hopfields's result to all matrices Tij, provided /,T,j is symmetric 
and Aij is positive definite. Following (A.3.7), matrix A must be diagonal 
and therefore its eigenvectors are equal to the /,'s, each of which can be any 
arbitrary positive numbers. Whatever the values of the /,'s there are definite 
relations which must hold between the Tij. In particular, for all i,j,k, we 
have 
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|.^.^ = l. (..3.9») 



with the auxihary conditions 



Tij ■ Tji > 0. (A.3.96) 

More compHcated relations can be deduced but they can all be obtained 
by combining relations of type (A. 3. 9). 

We now examine the constraints in more detail. Suppose we specify the 
values Tij, i < j in the upper right half of the matrix. Then the values in the 
lower left half are given by 

Tji = Tij f. (A3.10) 

We will ignore the case when all elements in colvunn j are zero (and 
therefore also all elements in row j), since the order of the matrix will simply 
be reduced by one. Otherwise the lower left half of the matrix is determined 
by the h/lj. Now all the U/lj can be determined from a basic n — \ elements 



^' ^' ^"-^ (^.3.11) 



h h in 



dE V- dE dVj ^ , . , 
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for all motions. Requiring the update rule to be related to E by (A. 3.2) 

will ensure (A.3.6). There are many other ways of enforcing (A.3.12). For 

Hopfield's updating rule, however, we have been tinable to find any other 

integrable Lyapunov function. 
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